Feature/vector abstraction #1

Open
phlealr wants to merge 6 commits into senecajs:main from phlealr:feature/vector-abstraction

phlealr commented May 29, 2026

Notes: I tried to handle Postgres the same way seneca-postgres-store does; I'm not sure it was the best approach.
I added two new options alongside the pre-existing OpenSearch one: pg, which receives the Postgres credentials, and driver, which lets the caller choose which driver to use.

I also made some updates to the yml files, so let me know if I shouldn't have.

What's in here

  • Driver abstraction: a small Driver interface (connect/close/upsert/get/query/remove/removeQuery) plus a name→driver registry. The plugin
    translates Seneca entity messages into driver calls and maps results back into entity shape.
  • pgvector driver (the working local backend): KNN via cosine distance (score = 1 - (embedding <=> $1), in [0,1]), equality filters
    AND-combined with similarity, and parameterised SQL with an identifier guard against injection. The plugin does not manage the schema; the
    DDL is documented.
  • opensearch driver: kept from the original template as a second driver, showing that the abstraction isn't vendor-shaped.
  • Per-canon vector dim declaration (canon: { 'doc/chunk': { vector: { dim } } }) with a mismatch check on save.
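
For readers who have not opened src/driver/Driver.ts, here is a rough sketch of what a contract with those seven methods could look like. The method names come from the list above; every signature, the `VectorRecord`/`QuerySpec`/`Hit` types, and the `MemoryDriver` class are assumptions made for illustration and are not the PR's actual code.

```typescript
// Hypothetical shapes: the PR names the seven methods, not their signatures.
type Metadata = Record<string, unknown>

interface VectorRecord {
  id: string
  vector: number[]
  metadata: Metadata
}

interface QuerySpec {
  vector?: number[] // KNN probe; omit for a filter-only query
  filter?: Metadata // equality filters, AND-combined
  limit?: number
}

type Hit = { id: string; metadata: Metadata; score?: number }

interface Driver {
  connect(): Promise<void>
  close(): Promise<void>
  upsert(table: string, rec: VectorRecord): Promise<void>
  get(table: string, id: string): Promise<Hit | null>
  query(table: string, spec: QuerySpec): Promise<Hit[]>
  remove(table: string, id: string): Promise<void>
  removeQuery(table: string, filter: Metadata): Promise<number>
}

// Cosine similarity, i.e. 1 - cosine distance.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

const matches = (m: Metadata, f: Metadata = {}): boolean =>
  Object.entries(f).every(([k, v]) => m[k] === v)

// In-memory driver (illustrative only): shows the contract end to end.
class MemoryDriver implements Driver {
  private tables = new Map<string, Map<string, VectorRecord>>()

  private rows(table: string): Map<string, VectorRecord> {
    let t = this.tables.get(table)
    if (!t) {
      t = new Map()
      this.tables.set(table, t)
    }
    return t
  }

  async connect(): Promise<void> {}
  async close(): Promise<void> { this.tables.clear() }

  async upsert(table: string, rec: VectorRecord): Promise<void> {
    this.rows(table).set(rec.id, rec)
  }

  // Metadata only, no embedding.
  async get(table: string, id: string): Promise<Hit | null> {
    const r = this.rows(table).get(id)
    return r ? { id: r.id, metadata: r.metadata } : null
  }

  async query(table: string, spec: QuerySpec): Promise<Hit[]> {
    const probe = spec.vector
    const hits: Hit[] = Array.from(this.rows(table).values())
      .filter((r) => matches(r.metadata, spec.filter))
      .map((r) => ({
        id: r.id,
        metadata: r.metadata,
        ...(probe ? { score: cosine(probe, r.vector) } : {}),
      }))
    if (probe) hits.sort((x, y) => (y.score ?? 0) - (x.score ?? 0))
    return hits.slice(0, spec.limit ?? hits.length)
  }

  async remove(table: string, id: string): Promise<void> {
    this.rows(table).delete(id)
  }

  async removeQuery(table: string, filter: Metadata): Promise<number> {
    const t = this.rows(table)
    let n = 0
    for (const [id, r] of Array.from(t)) {
      if (matches(r.metadata, filter)) {
        t.delete(id)
        n++
      }
    }
    return n
  }
}
```

A driver of this shape can stand in for a real backend when exercising the translation layer, which is roughly the role the MockDriver plays in the tests described below.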

Connection lifecycle fix

The plugin originally opened the DB connection from a seneca-promisify prepare hook. That caused the store's close cmd to fire spuriously
during plugin init, tearing the pool back down, so every save$/load$/list$ threw "connect() has not been called". The mock-driver tests
didn't catch it (mock methods don't need a live connection) and the pgvector tests bypassed Seneca.

Now the connection is opened from the canonical store init action (seneca.add({ init: store.name, tag: meta.tag }, …)), exactly like
seneca-postgres-store: Seneca runs it during ready(), before routing store messages, so the connection is live before the first op; the
store close cmd ends the pool.
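
The ordering problem can be modelled without Seneca at all. The sketch below is only an illustration: `MiniHost` is a hypothetical stand-in for the host framework (it is not Seneca's API), and `GuardedDriver` is an assumed driver whose operations refuse to run until `connect()` has been called. The point is simply that operations succeed only between the init handler and the close handler.

```typescript
// Hypothetical stand-in for the host framework: NOT Seneca's real API,
// just enough to show the order in which handlers run.
type Handler = () => Promise<void>

class MiniHost {
  private handlers = new Map<string, Handler>()

  add(name: string, handler: Handler): void {
    this.handlers.set(name, handler)
  }

  async act(name: string): Promise<void> {
    const h = this.handlers.get(name)
    if (!h) throw new Error(`no handler for ${name}`)
    await h()
  }
}

// Assumed driver shape: every operation checks that a connection exists.
class GuardedDriver {
  private connected = false

  async connect(): Promise<void> {
    this.connected = true // a real driver would create its pool here
  }

  async close(): Promise<void> {
    this.connected = false // a real driver would end its pool here
  }

  async get(id: string): Promise<string> {
    if (!this.connected) throw new Error('connect() has not been called')
    return `row:${id}`
  }
}

// Open on init, tear down on close; nothing else touches the connection.
function wireLifecycle(host: MiniHost, driver: GuardedDriver): void {
  host.add('init', () => driver.connect())
  host.add('close', () => driver.close())
}
```

If `close` fires before the first operation, as it did with the prepare hook, `get` throws exactly the error quoted above.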

Testing

Two-level suite:

  • Translation layer (test/VectorStore.test.ts) — a MockDriver verifies the Seneca↔driver translation with no backend.
  • Driver integration (test/driver/PgvectorDriver.test.ts) — the pgvector SQL/KNN behaviour directly.

A local pgvector is provided via Docker Compose:

docker compose up -d
export SENECA_VECTOR_PG_URL=postgres://postgres:postgres@localhost:5432/postgres
npm test

The translation tests run with no backend; the pgvector tests skip cleanly unless SENECA_VECTOR_PG_URL is set. 52 tests pass with the DB;
25 pass + the rest skip without it.
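
One way such an environment gate could be written, independent of the test runner, is sketched here. The variable name SENECA_VECTOR_PG_URL is from this PR; the helper names `pgTestUrl` and `gated` are hypothetical and the repository's tests may gate differently.

```typescript
type Env = Record<string, string | undefined>

// Returns the connection URL when the integration tests should run, else null.
function pgTestUrl(env: Env): string | null {
  const url = (env.SENECA_VECTOR_PG_URL ?? '').trim()
  return url === '' ? null : url
}

// A describe-like function, as offered by most test runners.
type Suite = (name: string, body: () => void) => void

// Picks the runner's "skip" variant when no database URL is configured.
function gated(env: Env, run: Suite, skip: Suite): Suite {
  return pgTestUrl(env) === null ? skip : run
}

// Possible usage, assuming the runner exposes describe and describe.skip:
//   gated(process.env, describe, describe.skip)('PgvectorDriver', () => { ... })
```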

phlealr and others added 6 commits May 28, 2026 19:36
PR1 of the vector-store refactor. Introduces a Driver abstraction
(src/driver/Driver.ts) and re-wraps the existing OpenSearch logic as the
'opensearch' driver (src/driver/OpensearchDriver.ts) behind a vendor-agnostic
plugin shell (src/VectorStore.ts). Adapting OpenSearch was minimal: AwsSigv4Signer
setup, save/load/list/remove handlers, and buildQuery preserved as-is.

Caller selects the driver via driver: '<name>' (or the new DriverName enum).
Future drivers (pgvector in PR2) need a new file under src/driver/ + one line in
the internal registry + one enum entry — the Record<DriverName, ...> shape
forces all three to stay in sync at compile time.
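
A minimal sketch of how a Record keyed by an enum enforces that, assuming a simplified Driver interface and stub factories. The enum member names, `DriverFactory`, and `resolveDriver` are illustrative; only the string values 'opensearch' and 'pgvector' are taken from this PR, and the pgvector entry is shown ahead of PR2 purely to make the point.

```typescript
// Simplified for the example; the real interface has more methods.
interface Driver {
  connect(): Promise<void>
  close(): Promise<void>
}

enum DriverName {
  Opensearch = 'opensearch',
  Pgvector = 'pgvector',
}

type DriverFactory = (options: Record<string, unknown>) => Driver

// Stub factory standing in for the real driver constructors.
const stub: DriverFactory = () => ({
  connect: async () => {},
  close: async () => {},
})

// Record<DriverName, ...> requires one entry per enum member:
// adding a member without registering it here fails to compile.
const registry: Record<DriverName, DriverFactory> = {
  [DriverName.Opensearch]: stub,
  [DriverName.Pgvector]: stub,
}

// Runtime lookup for the string passed as the `driver` option.
function resolveDriver(name: string): DriverFactory {
  if (!Object.prototype.hasOwnProperty.call(registry, name)) {
    throw new Error(`unknown driver: ${name}`)
  }
  return registry[name as DriverName]
}
```

The compile-time check covers the enum and the registry; the runtime check in `resolveDriver` covers callers who pass a plain string.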

Adds:
- docker-compose.yml with pgvector/pgvector:pg16 (for PR2)
- pg + @types/pg deps (other deps unchanged)
- VectorStore.test.ts smoke + utils tests (11 passing)
- Backward-compat alias: options.index continues to work as options.table

Existing OpenSearch tests stay env-var gated against live AWS (skip cleanly
without SENECA_OPENSEARCH_TEST_NODE / _INDEX).

CI: matrix narrowed to ubuntu-latest + Node 20/22, with a pgvector service for
the upcoming PR2 integration tests. README is unchanged in this commit and will
be rewritten alongside the pgvector driver work.

Package renamed: @seneca/opensearch-store -> @seneca/vector-store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PgvectorDriver implements the full Driver contract:
- connect/close over a pg Pool
- upsert (INSERT ... ON CONFLICT), get (metadata only, no embedding)
- query: KNN cosine via the <=> operator (score = 1 - distance), equality
  filters over jsonb, filter-only, and empty paths
- remove / removeQuery (DELETE by id / by filter — the all$ path)
Vectors are encoded with pgvector.toSql; LIMIT is parameterized; table and
filter-key identifiers pass a strict regex guard; ids are generated per-driver
when the caller omits one.
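
As a rough illustration of the query path, the sketch below builds a parameterized KNN statement with an identifier guard. It is not the driver's code: the column names (id, metadata, embedding), the exact regex, and the `buildKnn` helper are assumptions, and the vector is encoded by hand in pgvector's bracketed text format where the driver uses pgvector.toSql.

```typescript
// Assumed guard: plain SQL identifiers only, so nothing can break out of the statement.
const IDENT = /^[A-Za-z_][A-Za-z0-9_]*$/

function checkIdent(name: string): string {
  if (!IDENT.test(name)) throw new Error(`invalid identifier: ${name}`)
  return name
}

interface Sql {
  text: string
  values: unknown[]
}

function buildKnn(
  table: string,
  vector: number[],
  filter: Record<string, string>,
  limit: number
): Sql {
  // $1 is the probe vector in pgvector's text form, e.g. '[0.1,0.2]'.
  const values: unknown[] = [`[${vector.join(',')}]`]
  const where: string[] = []

  for (const [key, val] of Object.entries(filter)) {
    // Keys are interpolated, so they must pass the guard; values are bound.
    values.push(val)
    where.push(`metadata->>'${checkIdent(key)}' = $${values.length}`)
  }

  values.push(limit) // LIMIT is a bound parameter too

  const text =
    `SELECT id, metadata, 1 - (embedding <=> $1::vector) AS score ` +
    `FROM ${checkIdent(table)} ` +
    (where.length ? `WHERE ${where.join(' AND ')} ` : '') +
    `ORDER BY embedding <=> $1::vector LIMIT $${values.length}`

  return { text, values }
}
```

Ordering by the raw distance rather than by the computed score keeps the ORDER BY expression in the form `embedding <=> $1`, which is the form a pgvector index can serve.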

Plugin: pgvector registered in the driver registry + DriverName enum; save
validates the vector dim per canon.

Tests restructured into two layers that match what this plugin actually owns:
- translation (test/VectorStore.test.ts, MockDriver, no DB): verifies the
  plugin maps the Seneca entity API <-> Driver interface correctly — save
  strips id/vector into metadata, dim validation, load mapping, list filter
  partitioning + vector$ directive + custom$.score, remove/removeQuery dispatch.
- driver integration (test/driver/PgvectorDriver.test.ts, real pgvector via
  docker): SQL/KNN ordering/score range/encoding/delete/identifier guards.
OpenSearch live-AWS tests stay env-gated (skip without credentials).
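
The save half of that translation might look roughly like the following. This is a sketch under assumptions: the entity field names `id` and `vector`, the `CanonSpec` and `UpsertRecord` types, and the `toUpsert` helper are hypothetical; only the per-canon `vector: { dim }` declaration is taken from the plugin options.

```typescript
interface CanonSpec {
  vector?: { dim: number }
}

interface UpsertRecord {
  id?: string // left undefined so the driver can generate one
  vector: number[]
  metadata: Record<string, unknown>
}

function toUpsert(
  canon: string,
  ent: Record<string, unknown>,
  specs: Record<string, CanonSpec>
): UpsertRecord {
  // Split id and vector off; everything else becomes metadata.
  const { id, vector, ...metadata } = ent

  if (!Array.isArray(vector) || !vector.every((x) => typeof x === 'number')) {
    throw new Error(`${canon}: vector must be an array of numbers`)
  }

  // Dim check applies only when the canon declares a dim.
  const dim = specs[canon]?.vector?.dim
  if (dim !== undefined && vector.length !== dim) {
    throw new Error(
      `${canon}: vector has ${vector.length} dims, canon declares ${dim}`
    )
  }

  return {
    id: typeof id === 'string' ? id : undefined,
    vector,
    metadata,
  }
}
```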

Stop tracking dist/ (now gitignored; it is a build artifact).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Generalize the OpenSearch template README to the vendor-agnostic vector
store: same Seneca entity API across drivers, driver selected via the
`driver` option. Documents the driver registry (opensearch, pgvector),
vector save / KNN similarity (directive$.vector$) / equality filters /
remove, pgvector SQL setup, per-driver differences, and the docker-compose
test workflow. Keeps the original README's structure and tone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The plugin opened the DB connection from a seneca-promisify `prepare`
hook, which caused the store's `close` cmd to fire spuriously during
plugin init — tearing the pool back down so every save$/load$/list$
threw "connect() has not been called". The mock-driver tests missed it
(MockDriver methods don't need a live connection) and the pgvector
tests bypassed Seneca entirely.

Open the connection from the canonical store init action instead
(`seneca.add({init: store.name, tag: meta.tag}, ...)`), matching
seneca-postgres-store: Seneca runs it during ready(), before routing
store messages, so the connection is live before the first op. The
store `close` cmd ends the pool.

Also:
- Driver.remove/removeQuery are now required (every driver implements
  them); drop the "driver does not support remove" guards and the
  MockDriverNoRemove test scaffolding.
- Add test/VectorStorePg.test.ts: exercises the abstraction (the Seneca
  entity API) against a real pgvector backend, gated on
  SENECA_VECTOR_PG_URL. This is what the task asked for and what was
  missing — it now passes (and originally exposed the lifecycle bug).

52 tests pass with SENECA_VECTOR_PG_URL set; 25 pass + the rest skip
cleanly without it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

senecajs/todo-to-issue-action@master fails on every push (urllib3 TypeError
in the action's Docker image; stale inputs). Disable the workflow rather than
run a broken action: manual-only trigger + an always-skipped job, with a
header comment explaining the breakage and how to re-enable via the
maintained upstream alstr/todo-to-issue-action@v5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>